Solving Infinite Horizon Discounted Markov Decision Process Problems for a Range of Discount Factors

Authors

  • D. J. White
Abstract

In this paper we will assume the following framework. There is a finite state set I, with i ∈ I as its generic member, 1 ≤ i ≤ m. For each i ∈ I, there is a finite action set K(i), with k ∈ K(i) as its generic member. For each i ∈ I, k ∈ K(i), there is a transition probability p(i, j, k) that if at a decision epoch the state is i ∈ I, and if action k ∈ K(i) is taken, then the state will be j ∈ I at the next decision epoch, and there is an immediate reward r(i, k), with 0 ≤ r(i, k) ≤ M < ∞. There is a discount factor τ in the interval [0, ρ], for some fixed ρ < 1. We will be interested in values of τ within this range, and we will parameterize τ by a parameter t ∈ [0, 1], so that τ = tρ. The actual values of t to be studied will depend upon the questions we may wish to answer and how these might be answered efficiently. A pure Markov decision rule is a function δ: I → K = ∪_{i∈I} K(i), where if i ∈ I then δ(i) ∈ K(i). We will be concerned with maximizing the infinite horizon discounted rewards, and we need consider neither time dependent decision rules, nor rules which are a function of the complete history of the process up to a specified decision epoch, nor rules which select an action with a specified probability (see van der Wal [4]). A general policy π is an infinite sequence of decision rules which determine an action at each decision epoch, as a function of the past history, with some probability. In the light of the previous remark we need only consider policies of the form π = (δ)^∞, where δ ∈ Δ, the set of pure Markov decision rules, and (δ)^∞ simply means the application of δ an infinite number of times. If v_t(i) is the maximum infinite horizon expected discounted reward, with …
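
As a concrete reading of this framework, the sketch below runs value iteration on a toy instance and sweeps the parameter t over a few values, so that the discount factor τ = tρ ranges over [0, ρ]. All numbers (the two states, two actions per state, rewards, and transition matrices) are invented for illustration and are not taken from the paper; this is the textbook algorithm, not White's method for handling the whole range of discount factors at once.

```python
import numpy as np

def value_iteration(P, r, tau, tol=1e-10):
    """P[k] is the m x m matrix of p(i, j, k); r[i, k] is the reward; tau the discount."""
    m, K = r.shape
    v = np.zeros(m)
    while True:
        # Q(i, k) = r(i, k) + tau * sum_j p(i, j, k) * v(j)
        q = r + tau * np.stack([P[k] @ v for k in range(K)], axis=1)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)  # value vector and a pure Markov rule delta
        v = v_new

# Toy instance: m = 2 states, K(i) = {0, 1} for both states, rho = 0.95.
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),   # transitions under action 0
     np.array([[0.5, 0.5], [0.9, 0.1]])]   # transitions under action 1
r = np.array([[1.0, 0.2], [0.0, 0.6]])     # 0 <= r(i, k) <= M with M = 1
rho = 0.95
for t in (0.0, 0.5, 1.0):                  # tau = t * rho sweeps [0, rho]
    v, delta = value_iteration(P, r, t * rho)
    print(f"t = {t:.1f}: v = {np.round(v, 4)}, delta = {delta}")
```

Re-solving from scratch for each t, as done here, is exactly the naive baseline that methods for a range of discount factors aim to improve upon.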

Similar articles

On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies

This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs...
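
As an aside, for readers unfamiliar with (s, S) policies: the sketch below estimates the discounted cost of one such policy on a toy periodic-review problem by simulation. Every parameter (the levels s and S, the cost coefficients, the Poisson demand) is made up for illustration and does not come from the paper above.

```python
import numpy as np

rng = np.random.default_rng(1)
s, S = 2, 8                        # reorder point and order-up-to level
setup, hold, back = 5.0, 1.0, 4.0  # setup, holding, and backorder unit costs
tau = 0.95                         # discount factor

def discounted_cost(horizon=2000):
    """Simulate one run of the (s, S) rule and return its discounted cost."""
    x, total, disc = 3, 0.0, 1.0   # x is net inventory (negative = backorders)
    for _ in range(horizon):
        order = S - x if x < s else 0   # the (s, S) rule
        x += order
        x -= rng.poisson(3.0)           # random one-period demand
        stage = (setup if order > 0 else 0.0) + hold * max(x, 0) + back * max(-x, 0)
        total += disc * stage
        disc *= tau
    return total

print("estimated discounted cost:", np.mean([discounted_cost() for _ in range(20)]))
```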

Loss Bounds for Uncertain Transition Probabilities in Markov Decision Processes

We analyze losses resulting from uncertain transition probabilities in Markov decision processes with bounded nonnegative rewards. We assume that policies are pre-computed using exact dynamic programming with the estimated transition probabilities, but the system evolves according to different, true transition probabilities. Our approach analyzes the growth of errors incurred by stepping backward...
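
The quantity being bounded can be illustrated in a few lines: compute a policy by dynamic programming under estimated transition probabilities, then evaluate it under the true ones. The toy numbers below are invented; the paper's contribution is the analytical bound on this loss, not the computation itself.

```python
import numpy as np

def greedy_policy(P, r, tau, iters=500):
    """Value iteration under the model P; returns a greedy pure policy."""
    m, K = r.shape
    v = np.zeros(m)
    for _ in range(iters):
        q = r + tau * np.stack([P[k] @ v for k in range(K)], axis=1)
        v = q.max(axis=1)
    return q.argmax(axis=1)

def policy_value(P, r, tau, delta):
    """Exact value of a stationary policy: v = (I - tau * P_delta)^(-1) r_delta."""
    m = r.shape[0]
    P_delta = np.stack([P[delta[i]][i] for i in range(m)])
    r_delta = np.array([r[i, delta[i]] for i in range(m)])
    return np.linalg.solve(np.eye(m) - tau * P_delta, r_delta)

tau = 0.9
r = np.array([[0.5, 1.0], [0.0, 0.0]])         # state 1 is a zero-reward trap
P_true = [np.array([[1.0, 0.0], [0.0, 1.0]]),  # action 0: stay put
          np.array([[0.1, 0.9], [0.0, 1.0]])]  # action 1: risky in truth
P_hat  = [np.array([[1.0, 0.0], [0.0, 1.0]]),
          np.array([[0.9, 0.1], [0.0, 1.0]])]  # action 1 looks safe here

delta_hat = greedy_policy(P_hat, r, tau)       # planned on the wrong model
v_opt = policy_value(P_true, r, tau, greedy_policy(P_true, r, tau))
v_hat = policy_value(P_true, r, tau, delta_hat)  # what it actually earns
print("loss per state:", v_opt - v_hat)          # componentwise nonnegative
```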

On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems

We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to show that the following problem of plan existence in probabilistic planning is undecidable: given a probabilistic planning problem, determine whether there exists...

Sensitivity of Constrained Markov Decision Processes

We consider the optimization of finite-state, finite-action Markov Decision processes, under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic Linear Program, through which we investigate sensitivity...
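
The generic Linear Program mentioned here is, in the discounted case, commonly written over occupation measures x(i, k); the hedged sketch below sets up such an LP for a made-up two-state instance with one extra cost constraint, using scipy. The data and the budget are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

m, K, tau = 2, 2, 0.9
r = np.array([[1.0, 2.0], [0.0, 0.5]])    # rewards r(i, k)
c = np.array([[0.0, 1.0], [0.0, 1.0]])    # constrained cost c(i, k)
P = np.array([[[0.9, 0.1], [0.3, 0.7]],   # P[0][i, j] = p(i, j, 0)
              [[0.4, 0.6], [0.8, 0.2]]])  # P[1][i, j] = p(i, j, 1)
alpha = np.full(m, 1.0 / m)               # initial state distribution

# Occupation-measure variables x(i, k), flattened as i * K + k. Flow balance:
# sum_k x(j, k) - tau * sum_{i, k} p(i, j, k) x(i, k) = alpha(j).
A_eq = np.zeros((m, m * K))
for j in range(m):
    for i in range(m):
        for k in range(K):
            A_eq[j, i * K + k] = float(i == j) - tau * P[k, i, j]

budget = 2.0                              # bound on discounted cost (made up)
res = linprog(-r.reshape(-1),             # linprog minimizes, so negate reward
              A_ub=c.reshape(1, -1), b_ub=[budget],
              A_eq=A_eq, b_eq=alpha, bounds=(0, None))
x = res.x.reshape(m, K)
print("optimal discounted reward:", -res.fun)
print("induced randomized policy:\n", x / x.sum(axis=1, keepdims=True))
```

Note that the optimizer may place positive mass on several actions in a state: optimal policies for constrained MDPs are in general randomized, which is one reason the LP view is natural here.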

Recent Results in Controlled Markov Chains with Risk Sensitive Average Criteria: the Vanishing Discount Approach

Countable state space Markov cost/reward chains, satisfying a Lyapunov-type stability condition, are considered in this work. For an infinite planning horizon, risk sensitive (exponential) discounted and average cost criteria are considered. The main contribution is the development of a vanishing discount approach to relate the discounted criterion problem with the average criterion one, as t...
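
The paper treats risk-sensitive (exponential) criteria; as a much simpler, risk-neutral stand-in, the sketch below shows the classical vanishing discount relation (1 − τ)v_τ(i) → g on a made-up two-state chain under a fixed policy. It is meant only to convey the flavor of letting the discount vanish, not the paper's actual argument.

```python
import numpy as np

P = np.array([[0.5, 0.5], [0.2, 0.8]])   # transition matrix of a fixed policy
r = np.array([1.0, 0.0])                 # per-step reward
for tau in (0.9, 0.99, 0.999):
    v = np.linalg.solve(np.eye(2) - tau * P, r)   # discounted value v_tau
    print(f"tau = {tau}: (1 - tau) * v_tau = {np.round((1 - tau) * v, 4)}")

# For comparison, the long-run average reward g = pi @ r, where pi is the
# stationary distribution (left eigenvector of P for eigenvalue 1).
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
print("g =", pi @ r)   # (1 - tau) * v_tau(i) -> g for every state i
```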

Publication date: 2003